27 research outputs found

    On Objective Measures of Rule Surprisingness

    Most of the literature argues that surprisingness is an inherently subjective aspect of the discovered knowledge, which cannot be measured in objective terms. This paper departs from this view, and it has a twofold goal: (1) showing that it is indeed possible to define objective (rather than subjective) measures of discovered rule surprisingness; (2) proposing new ideas and methods for defining objective rule surprisingness measures.

    Inducing safer oblique trees without costs

    Decision tree induction has been widely studied and applied. In safety applications, such as determining whether a chemical process is safe or whether a person has a medical condition, the cost of misclassification in one of the classes is significantly higher than in the other class. Several authors have tackled this problem by developing cost-sensitive decision tree learning algorithms or have suggested ways of changing the distribution of training examples to bias the decision tree learning process so as to take account of costs. A prerequisite for applying such algorithms is the availability of costs of misclassification. Although this may be possible for some applications, obtaining reasonable estimates of costs of misclassification is not easy in the area of safety. This paper presents a new algorithm for applications where the cost of misclassifications cannot be quantified, although the cost of misclassification in one class is known to be significantly higher than in another class. The algorithm utilizes linear discriminant analysis to identify oblique relationships between continuous attributes and then carries out an appropriate modification to ensure that the resulting tree errs on the side of safety. The algorithm is evaluated with respect to one of the best known cost-sensitive algorithms (ICET), a well-known oblique decision tree algorithm (OC1) and an algorithm that utilizes robust linear programming.

    Inter-comparison of the g-, f- and p-modes calculated using different oscillation codes for a given stellar model

    In order to make astroseismology a powerful tool to explore stellar interiors, different numerical codes should give the same oscillation frequencies for the same input physics. This work is devoted to test, compare and, if needed, optimize the seismic codes used to calculate the eigenfrequencies to be finally compared with observations. The oscillation codes of nine research groups in the field have been used in this study. The same physics has been imposed for all the codes in order to isolate the non-physical dependence of any possible difference. Two equilibrium models with different grids, 2172 and 4042 mesh points, have been used, and the latter model includes an explicit modelling of semiconvection just outside the convective core. Comparing the results for these two models illustrates the effect of the number of mesh points and their distribution in particularly critical parts of the model, such as the steep composition gradient outside the convective core. A comprehensive study of the frequency differences found for the different codes is given as well. These differences are mainly due to the use of different numerical integration schemes. The use of a second-order integration scheme plus a Richardson extrapolation provides similar results to a fourth-order integration scheme. The proper numerical description of the Brunt-Väisälä frequency in the equilibrium model is also critical for some modes. An unexpected result of this study is the high sensitivity of the frequency differences to the inconsistent use of values of the gravitational constant (G) in the oscillation codes, within the range of the experimentally determined ones, which differ from the value used to compute the equilibrium model. Comment: 18 pages, 34 figures.

    Perspectives in Global Helioseismology, and the Road Ahead

    We review the impact of global helioseismology on key questions concerning the internal structure and dynamics of the Sun, and consider the exciting challenges the field faces as it enters a fourth decade of science exploitation. We do so with an eye on the past, looking at the perspectives global helioseismology offered in its earlier phases, in particular the mid-to-late 1970s and the 1980s. We look at how modern, higher-quality, longer datasets coupled with new developments in analysis, have altered, refined, and changed some of those perspectives, and opened others that were not previously available for study. We finish by discussing outstanding challenges and questions for the field. Comment: Invited review; to appear in Solar Physics (24 pages, 6 figures).

    EviRank: An Evidence Based Content Trust Model for Web Spam Detection


    Reinventing Machine Learning with ROC Analysis


    Learning Compact Markov Logic Networks With Decision Trees

    Statistical-relational learning combines logical syntax with probabilistic methods. Markov Logic Networks (MLNs) are a prominent model class that generalizes both first-order logic and undirected graphical models (Markov networks). The qualitative component of an MLN is a set of clauses and the quantitative component is a set of clause weights. Generative MLNs model the joint distribution of relationships and attributes. A state-of-the-art structure learning method is the moralization approach: learn a set of directed Horn clauses, then convert them to conjunctions to obtain MLN clauses. The directed clauses are learned using Bayes net methods. The moralization approach takes advantage of the high-quality inference algorithms for MLNs and their ability to handle cyclic dependencies. A weakness of moralization is that it leads to an unnecessarily large number of clauses. In this paper we show that using decision trees to represent conditional probabilities in the Bayes net is an effective remedy that leads to much more compact MLN structures. In experiments on benchmark datasets, the decision trees reduce the number of clauses in the moralized MLN by a factor of 5-25, depending on the dataset. The accuracy of predictions is competitive with the models obtained by standard moralization, and in many cases superior.

    A Study with Class Imbalance and Random Sampling for a Decision Tree Learning System
